Audio-Video Sensor Fusion with Probabilistic Graphical Models

Authors

  • Matthew J. Beal
  • Hagai Attias
  • Nebojsa Jojic
Abstract

We present a new approach to modeling and processing multimedia data. This approach is based on graphical models that combine audio and video variables. We demonstrate it by developing a new algorithm for tracking a moving object in a cluttered, noisy scene using two microphones and a camera. Our model uses unobserved variables to describe the data in terms of the process that generates them. It is therefore able to capture and exploit the statistical structure of the audio and video data separately, as well as their mutual dependencies. Model parameters are learned from data via an EM algorithm, and automatic calibration is performed as part of this procedure. Tracking is done by Bayesian inference of the object location from data. We demonstrate successful performance on multimedia clips captured in real world scenarios using off-the-shelf equipment.
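The idea of tracking by Bayesian inference of the object location from audio and video data can be illustrated with a toy sketch. This is not the authors' model: it assumes, purely for illustration, that the inter-microphone time delay is a noisy linear function of the horizontal position, that the camera yields a noisy position cue, and that the prior over a grid of candidate positions is uniform. The function names and the parameters `alpha`, `beta`, `sigma_audio`, and `sigma_video` are hypothetical; in the paper such parameters are learned by EM, whereas here they are fixed by hand.

```python
import math


def gaussian_pdf(value, mean, std):
    """Density of a normal distribution N(mean, std^2) evaluated at value."""
    z = (value - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fuse_position(delay, video_x, grid, alpha, beta, sigma_audio, sigma_video):
    """Posterior over candidate positions given one audio delay and one video cue.

    Illustrative assumptions (not taken from the paper's abstract):
      delay   ~ N(alpha * x + beta, sigma_audio^2)
      video_x ~ N(x, sigma_video^2)
      uniform prior over the grid of candidate positions x
    """
    weights = [
        gaussian_pdf(delay, alpha * x + beta, sigma_audio)
        * gaussian_pdf(video_x, x, sigma_video)
        for x in grid
    ]
    total = sum(weights)
    # Normalize so the posterior sums to one over the grid.
    return [w / total for w in weights]


def posterior_mean(grid, posterior):
    """Expected position under the discrete posterior."""
    return sum(x * p for x, p in zip(grid, posterior))


# Candidate horizontal positions in [-1, 1] (arbitrary units).
grid = [i / 100.0 for i in range(-100, 101)]

# Hand-picked values: the audio cue alone points to x = 0.3,
# the video cue alone points to x = 0.2.
posterior = fuse_position(
    delay=0.6, video_x=0.2, grid=grid,
    alpha=2.0, beta=0.0, sigma_audio=0.2, sigma_video=0.1,
)
estimate = posterior_mean(grid, posterior)
```

With these hand-picked values both cues are equally precise in position units, so the fused estimate falls midway between them, near 0.25. A grid is used here only to keep the sketch short and dependency-free.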

Similar resources

Rule-based joint fuzzy and probabilistic networks

One of the important challenges in graphical models is dealing with uncertainty. Among graphical networks, the fuzzy cognitive map is only capable of modeling fuzzy uncertainty, and the Bayesian network is only capable of modeling probabilistic uncertainty. In many real problems, we face both fuzzy and probabilistic uncertainties. In these cases, the propo...

Audio-Visual Event Recognition with Graphical Models

In this work, different applications for the automated detection of events have been investigated using audio-visual pattern recognition methods. The recorded data were taken from both video surveillance and video conferences. Acoustic, visual and semantic features are extracted from the available data and are subsequently analysed with the help of graphical models. These are particularl...

Audiovisual Information Fusion in Human-Computer Interfaces and Intelligent Environments: A Survey

Microphones and cameras have been extensively used to observe and detect human activity and to facilitate natural modes of interaction between humans and intelligent systems. The human brain processes the audio and video modalities, extracting complementary and robust information from them. Intelligent systems with audio-visual sensors should be capable of achieving similar goals. The audio-visual i...

Statistical and Information-Theoretic Methods for Self-Organization and Fusion of Multimodal, Networked Sensors

The appeal of distributed sensing and computation is matched by the formidable challenges it presents in terms of estimation and communication. Applications range from military surveillance to collaborative office environments. Despite the attractiveness of exploiting networks of low-power and low-cost sensors, how to do so is a difficult problem. In this paper, we adopt a statistical viewpoint ...

Harmonium Models for Video Classification

Accurate and efficient video classification demands the fusion of multimodal information and the use of intermediate representations. Combining the two ideas into one framework, we propose a series of probabilistic models for video representation and classification using intermediate semantic representations derived from multimodal features of video. On the basis of a class of bipartite undirec...

Publication date: 2002